LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Large Language Models (LLMs) have transformed the landscape of artificial intelligence, while their enormous size presents significant challenges in terms of computational cost. We introduce LoRAShear, a novel, efficient approach to structurally prune LLMs and recover knowledge. Given a general LLM, LoRAShear first creates dependency graphs over the LoRA modules to discover minimal removal structures and analyze the knowledge distribution. It then proceeds with progressive structured pruning on the LoRA adaptors and enables inherent knowledge transfer to better preserve the information in the redundant structures. To recover the knowledge lost during pruning, LoRAShear proposes dynamic fine-tuning schemes with dynamic data adaptors that effectively narrow the performance gap to the full models. Numerical results demonstrate that, using only one GPU within a couple of GPU days, LoRAShear effectively reduces the footprint of LLMs by 20% with only 1.0% performance degradation and significantly outperforms the state of the art. The source code will be available at
https://github.com/microsoft/lorashear
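The core idea of removing whole "minimal removal structures" from LoRA adaptors can be illustrated with a minimal sketch, assuming a simple magnitude-based importance score. The function name, the scoring metric, and the row-wise grouping are illustrative assumptions, not LoRAShear's actual algorithm:

```python
import numpy as np

def prune_lora_groups(lora_B, group_scores, sparsity=0.2):
    """Zero out the lowest-scoring output rows of a LoRA B factor.

    lora_B: (out_features, r) low-rank factor; each row is treated as
    one minimal removal structure (a simplifying assumption).
    group_scores: importance score per row (hypothetical metric).
    sparsity: fraction of structures to remove.
    """
    n_prune = int(len(group_scores) * sparsity)
    # Indices of the least important structures.
    prune_idx = np.argsort(group_scores)[:n_prune]
    pruned = lora_B.copy()
    pruned[prune_idx, :] = 0.0  # structurally zeroed rows can be dropped later
    return pruned, prune_idx

B = np.random.default_rng(0).normal(size=(8, 4))
scores = np.abs(B).sum(axis=1)  # magnitude-based importance (assumed)
pruned_B, removed = prune_lora_groups(B, scores, sparsity=0.25)
```

Because the removed rows are exactly zero, the corresponding structures can be deleted from the network without changing its output, which is what makes the pruning "structured" rather than element-wise.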
OTOV2: Automatic, Generic, User-Friendly
The existing model compression methods via structured pruning typically require complicated multi-stage procedures. Each individual stage necessitates numerous engineering efforts and domain knowledge from the end users, which prevents their wider application to broader scenarios. We propose the second generation of Only-Train-Once (OTOv2), which for the first time automatically trains and compresses a general DNN only once from scratch to produce a more compact model with competitive performance, without fine-tuning. OTOv2 is automatic, pluggable into various deep learning applications, and requires minimal engineering effort from the users. Methodologically, OTOv2 proposes two major improvements: (i) Autonomy: automatically exploits the dependency of general DNNs, partitions the trainable variables into Zero-Invariant Groups (ZIGs), and constructs the compressed model; and (ii) Dual Half-Space Projected Gradient (DHSPG): a novel optimizer that more reliably solves structured-sparsity problems. Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet, and StackedUnets, the majority of which cannot be handled by other methods without extensive handcrafting efforts. On benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN, and ImageNet, its effectiveness is validated by performing competitively with, or even better than, the state of the art. The source code is available at
https://github.com/tianyic/only_train_once.

Comment: Published at ICLR 2023. Note that a few images of dependency graphs could not be included in the arXiv version due to exceeding the size limit.
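The half-space projection behind DHSPG can be sketched in a much simplified form: after a gradient step with group-sparse shrinkage, a whole Zero-Invariant Group is projected exactly to zero when the update flips it past the origin along the group's own direction. The step rule below is an illustrative simplification, not the paper's exact DHSPG algorithm:

```python
import numpy as np

def half_space_step(x, grad, groups, lr=0.1, lam=1e-3):
    """One simplified half-space projected gradient step.

    Each entry of `groups` is an index array forming one Zero-Invariant
    Group (ZIG). A group of the updated point is projected to exactly
    zero when it falls in the half-space {y : <x_g, y_g> <= 0}, i.e.
    the step crossed the origin along the group's direction.
    """
    x_new = x - lr * grad
    for g in groups:
        gx = x_new[g]
        # Group-lasso style shrinkage toward zero (illustrative).
        norm = np.linalg.norm(gx)
        if norm > 0:
            gx = gx * max(0.0, 1 - lr * lam / norm)
        if np.dot(x[g], gx) <= 0:     # crossed into the zero half-space
            gx = np.zeros_like(gx)    # project the whole group to zero
        x_new[g] = gx
    return x_new
```

Groups zeroed this way stay exactly zero as a block, which is what allows the compressed model to be cut out of the trained network without fine-tuning.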
Towards Automatic Neural Architecture Search within General Super-Networks
Existing neural architecture search (NAS) methods typically rely on pre-specified super deep neural networks (super-networks) with handcrafted search spaces defined beforehand. Such requirements make it challenging to extend them to general scenarios without significant human expertise and manual intervention. To overcome these limitations, we propose the third generation of Only-Train-Once (OTOv3). OTOv3 is perhaps the first automated system that trains general super-networks and produces high-performing sub-networks in a one-shot manner, without pretraining or fine-tuning. Technologically, OTOv3 delivers three notable contributions to minimize human effort: (i) automatic search space construction for general super-networks; (ii) a Hierarchical Half-Space Projected Gradient (H2SPG) that leverages the dependency graph to ensure network validity during optimization and reliably produces a solution with both high performance and hierarchical group sparsity; and (iii) automatic sub-network construction based on the super-network and the H2SPG solution. Numerically, we demonstrate the effectiveness of OTOv3 on a variety of super-networks, including RegNet, StackedUnets, SuperResNet, and DARTS, over benchmark datasets such as CIFAR10, Fashion-MNIST, ImageNet, STL-10, and SVHN. The sub-networks computed by OTOv3 achieve competitive or even superior performance compared to the super-networks and other state-of-the-art methods. The library will be released at
https://github.com/tianyic/only_train_once
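Step (iii), sub-network construction from a group-sparse solution, can be sketched for the simplest possible case of two consecutive dense layers: output channels of the first layer whose weight group is exactly zero are removed, and the dependent input channels of the next layer are removed with them. This two-layer toy, including the function name, is an illustrative assumption; the real system walks a full dependency graph:

```python
import numpy as np

def extract_subnetwork(w1, w2, eps=1e-12):
    """Build a compact sub-network from a group-sparse solution.

    w1: (c_out, c_in) weights of layer 1; w2: (c_out2, c_out) of layer 2.
    Channels of layer 1 whose weight group is exactly zero (as a
    hierarchical-sparsity solver like H2SPG would produce) are dropped,
    together with the dependent input channels of layer 2.
    """
    keep = np.linalg.norm(w1, axis=1) > eps  # nonzero output channels
    return w1[keep], w2[:, keep]

w1 = np.array([[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]])  # channel 1 is zeroed
w2 = np.ones((2, 3))
sub_w1, sub_w2 = extract_subnetwork(w1, w2)
```

Because the dropped groups contribute nothing to the forward pass, the extracted sub-network computes the same function as the trained super-network with those groups zeroed, so no fine-tuning is needed for correctness.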
Electrochemical reforming of ethanol with acetate co-production on nickel cobalt selenide nanoparticles
The energy efficiency of water electrolysis is limited by the sluggish reaction kinetics of the anodic oxygen evolution reaction (OER). To overcome this limitation, the OER can be replaced by a less demanding oxidation reaction, which in the ideal scenario can even be used to generate additional valuable chemicals. Herein, we focus on the electrochemical reforming of ethanol in alkaline media to generate hydrogen at a Pt cathode and acetate as a co-product at a NiCoSe anode. We first detail the solution synthesis of a series of NiCoSe electrocatalysts. By adjusting the Ni/Co ratio, the electrocatalytic activity and selectivity for the production of acetate from ethanol are optimized. The best performances are obtained at low substitutions of Ni by Co in the cubic NiSe phase. Density functional theory reveals that the Co substitution effectively enhances ethanol adsorption and decreases the energy barrier for the first dehydrogenation step of its conversion to acetate. However, we experimentally observe that too large an amount of Co decreases the ethanol-to-acetate Faradaic efficiency from values above 90% to just 50%. At the optimized composition, the NiCoSe electrode delivers a stable chronoamperometric current density of up to 45 mA cm⁻², corresponding to 1.2 A g⁻¹, in a 1 M KOH + 1 M ethanol solution, with a high ethanol-to-acetate Faradaic efficiency of 82.2% at a relatively low potential of 1.50 V vs. RHE, and with an acetate production rate of 0.34 mmol cm⁻² h⁻¹.

This work was supported by start-up funding at Chengdu University. It was also supported by the European Regional Development Funds and by the Spanish Ministerio de Economía y Competitividad through the project SEHTOP (ENE2016-77798-C4-3-R), the MCIN/AEI/10.13039/501100011033 project, and NANOGEN (PID2020-116093RB-C43). X. Wang, C. Xing, X. Han, R. He, Z. Liang, and Y. Zhang are grateful for scholarships from the China Scholarship Council (CSC). X. Han and J. Arbiol acknowledge funding from Generalitat de Catalunya 2017 SGR 327. ICN2 acknowledges support from the Severo Ochoa Programme (MINECO, Grant no. SEV-2013-0295). IREC and ICN2 are funded by the CERCA Programme / Generalitat de Catalunya.
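The Faradaic efficiency figures quoted above follow from a standard charge balance: the oxidation of ethanol to acetate transfers 4 electrons per molecule, so FE = nF·(moles of acetate)/(total charge passed). A minimal sketch, with the one-hour, one-square-centimetre operating point taken from the numbers in the abstract:

```python
F = 96485.33  # Faraday constant, C/mol

def faradaic_efficiency(mol_product, charge_C, n_electrons=4):
    """Fraction of the passed charge that formed the target product.

    n_electrons = 4 for the ethanol-to-acetate oxidation
    (CH3CH2OH + 5 OH- -> CH3COO- + 4 H2O + 4 e-).
    """
    return n_electrons * F * mol_product / charge_C

# 45 mA over 1 cm^2 for 1 h -> 0.045 A * 3600 s = 162 C of charge,
# producing 0.34 mmol of acetate per the reported production rate.
q = 0.045 * 3600
fe = faradaic_efficiency(0.34e-3, q)
```

Evaluating this gives a Faradaic efficiency of roughly 81%, consistent with the 82.2% reported at the same current density and production rate.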
A possible 250-second X-ray quasi-periodicity in the fast blue optical transient AT2018cow
The fast blue optical transients (FBOTs) are a new population of extragalactic transients of unclear physical origin. A variety of mechanisms have been proposed, including a failed supernova explosion, shock interaction with a dense medium, a young magnetar, accretion onto a compact object, and a stellar tidal disruption event, but none is conclusive. Here we report the discovery of a possible X-ray quasi-periodicity signal with a period of ~250 seconds (at a significance level of 99.76%) in the brightest FBOT, AT2018cow, through analysis of XMM-Newton/PN data. The signal is independently detected at the same frequency in the average power density spectrum from data taken with the Swift telescope, with observations covering from 6 to 37 days after the optical discovery, though at a lower significance level (94.26%). This suggests that the QPO frequency may be stable over at least 1.1 × 10⁴ cycles. Assuming the ~250 second QPO to be a scaled-down analogue of those typically seen in stellar-mass black holes, an intermediate black hole mass could be inferred. The overall X-ray luminosity evolution could be modeled as the tidal disruption of a star by a black hole of such a mass, providing a viable mechanism to produce AT2018cow. Our findings suggest that other bright FBOTs may also harbor intermediate-mass black holes.

Comment: 18 pages, 10 figures. Accepted for publication in Research in Astronomy and Astrophysics.
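A search like the one described, for a peak in the power density spectrum of an X-ray light curve, can be sketched on synthetic data. The Leahy-normalized periodogram below, the synthetic count rate, and all parameters are illustrative assumptions, not the authors' analysis pipeline:

```python
import numpy as np

def power_density_spectrum(counts, dt):
    """Leahy-normalized power density spectrum of an evenly sampled
    light curve (photon counts per time bin of width dt)."""
    n = len(counts)
    ft = np.fft.rfft(counts)
    # Leahy normalization: pure Poisson noise gives mean power ~2.
    power = 2.0 * np.abs(ft[1:]) ** 2 / counts.sum()
    freqs = np.fft.rfftfreq(n, d=dt)[1:]
    return freqs, power

# Synthetic light curve: Poisson noise around a mean rate of 20 ct/s
# with a 30% modulation at a 250 s period (4 mHz).
rng = np.random.default_rng(0)
dt, n = 5.0, 4096
t = np.arange(n) * dt
rate = 20.0 * (1 + 0.3 * np.sin(2 * np.pi * t / 250.0))
counts = rng.poisson(rate * dt)
freqs, power = power_density_spectrum(counts, dt)
peak_freq = freqs[np.argmax(power)]  # recovers ~4 mHz
```

In a real analysis the significance of such a peak is then assessed against the noise power distribution, which is what the quoted 99.76% and 94.26% levels refer to.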